ABSTRACT

Evolutionary algorithms (EAs) form a popular optimisation
paradigm inspired by natural evolution. In recent years the
field of evolutionary computation has developed a rigorous
analytical theory to analyse their runtime on many illustrative
problems. Here we apply this theory to a simple model
of natural evolution. In the Strong Selection Weak Mutation
(SSWM) evolutionary regime the time between occurrences
of new mutations is much longer than the time it takes for a
new beneficial mutation to take over the population. In this
situation, the population only contains copies of one genotype
and evolution can be modelled as a (1+1)-type process
where the probability of accepting a new genotype
(improvements or worsenings) depends on the change in fitness.

We present an initial runtime analysis of SSWM, quantifying
its performance for various parameters and investigating
differences to the (1+1) EA. We show that SSWM can have
a moderate advantage over the (1+1) EA at crossing fitness
valleys and study an example where SSWM outperforms the
(1+1) EA by taking advantage of information on the fitness
gradient.

Categories and Subject Descriptors 

F.2.2 [Analysis of Algorithms and Problem Complexity]:
Nonnumerical Algorithms and Problems

Keywords 

Runtime analysis, natural evolution, population genetics, 
theory, strong selection weak mutation regime 

1. INTRODUCTION 

In the last 20 years evolutionary computation has developed
a number of algorithmic techniques for the analysis of
evolutionary and genetic algorithms. These methods typically
focus on runtime, and allow for rigorous bounds on
the time required to reach a global optimum, or other
well-specified high-fitness solutions. The runtime analysis of
evolutionary algorithms has become one of the dominant
concepts in evolutionary computation, leading to a plethora of
results for evolutionary algorithms as well as novel
optimisation paradigms such as swarm intelligence [14] and
artificial immune systems [9].

Interestingly, although evolutionary algorithms are heavily
inspired by natural evolution, these methods have seldom
been applied to natural evolution as studied in mathematical
population genetics. This is a missed opportunity: the
time it takes for a natural population to reach a fitness peak
is an important question for the study of natural evolution.
The kinds of results obtained from runtime analysis, namely
how the runtime scales with genome size and mutation rate,
are of general interest to population genetics. Moreover,
recently there has been a renewed interest in applying
computer science methods to problems in evolutionary biology
with contributions from unlikely fields such as game theory,
machine learning and Markov chain theory [3].
Here, we present a first attempt at applying runtime analysis
to the so-called Strong Selection Weak Mutation regime
of natural populations.

The Strong Selection Weak Mutation model applies when
the population size, mutation rate, and selection strength
are such that the time between occurrences of new mutations
is long compared to the time a new genotype takes to replace
the parent genotype [6]. Under these conditions, only one
genotype is present in the population most of the time, and
evolution occurs through "jumps" between different genotypes,
corresponding to a new mutation replacing the resident
genotype in the population. The relevant dynamics
can then be characterized by a (1+1)-type stochastic process.
This model is obtained as a limit of many other models,
such as the Wright-Fisher model. One important aspect of
this model is that new solutions are accepted with a probability
$\frac{1-e^{-2\beta\Delta f}}{1-e^{-2N\beta\Delta f}}$ that depends
on the fitness difference $\Delta f$
between the new mutation and the resident genotype. Here
$N$ reflects the size of the underlying population, and $\beta$
represents the selection strength. One can think of $f$ as defining
a phenotype that is under selection to be maximized; $\beta$
quantifies how strongly a unit change in $f$ is favoured. This
probability was first derived by Kimura for a population
of $N$ individuals that are sampled binomially in proportion
to their fitness.

This choice of acceptance function introduces two main
differences to the (1+1) EA. First, solutions of lower fitness
(worsenings) may be accepted with some positive probability.
This is reminiscent of the Metropolis algorithm
(Simulated Annealing with constant temperature), which can
also accept worsenings. Second, solutions of
higher fitness can be rejected, since they are accepted with
a probability that is roughly proportional to the relative
advantage they have over the current solution.

We cast this model of natural evolution in a (1+1)-type
algorithm referred to as SSWM, using common mutation
operators from evolutionary algorithms. We then present
first runtime analyses of this process. Our aims are manifold:

• to explore the performance of natural evolution in the
context of runtime, comparing it against simple evolutionary
algorithms like the (1+1) EA,

• to investigate the non-elitist selection mechanism
implicit to SSWM and its usefulness in the context of
evolutionary algorithms, and

• to show that techniques for the analysis of evolutionary
algorithms can be applied to simple models of natural
evolution, aiming to open up a new research field at
the intersection of evolutionary computation and
population genetics.

Our results are summarised as follows. For the simple
function OneMax we show in Section 3 that with suitably
large population sizes, when $N\beta \ge \frac{1}{2}\ln(11n)$, SSWM is an
effective hill climber as it optimises OneMax in expected
time $O((n \log n)/\beta)$. However, when the population size is
by any constant factor smaller than this threshold, we
encounter a phase transition and SSWM requires exponential
time even on OneMax.

We then illustrate the particular features of the selection
rule in more depth. In Section 4 we consider a function
Cliff$_d$ where a fitness valley of Hamming distance $d$ needs
to be crossed. For $d = \omega(\log n)$ the (1+1) EA needs time
$\Theta(n^d)$, but SSWM is faster by a factor of $e^{\Omega(d)}$ because of
its ability to accept worse solutions. Finally, in Section 5
we illustrate on the function Balance [18] that SSWM can
drastically outperform the (1+1) EA because the
fitness-dependent selection drives it to follow the steepest gradient.
While the (1+1) EA needs exponential time in expectation,
SSWM with overwhelming probability finds an optimum in
polynomial time.

The main technical difficulties are that in contrast to the
simple (1+1) EA, SSWM is a non-elitist algorithm, hence
fitness-level arguments based on elitism are not applicable.
Level-based theorems for non-elitist populations [4] are not
applicable either because they require population sizes larger
than 1. Moreover, while for the (1+1) EA transition
probabilities to better solutions are solely determined by
probabilities for flipping bits during mutation, for SSWM these
additionally depend on the probability of fixation and hence the
absolute fitness difference. The analysis of SSWM is more
challenging than the analysis of the (1+1) EA, and requires
tailored proof techniques. We hope that these techniques
will be helpful for analysing other evolutionary algorithms
with fitness-based selection schemes.

2. PRELIMINARIES 

We define the optimisation time of SSWM as the first 
generation where the optimum is accepted as new individual. 


As can be seen from the description above, the model
resembles the (1+1) EA in that it only maintains one genotype
that may be replaced by mutated versions of it. The
candidate solutions are accepted with probability

\[ p_{\mathrm{fix}}(\Delta f) = \frac{1 - e^{-2\beta\Delta f}}{1 - e^{-2N\beta\Delta f}} \]

where $\Delta f \neq 0$ is the fitness difference to the current solution
and $N \ge 1$ is the size of the underlying population. For
$\Delta f = 0$ we define $p_{\mathrm{fix}}(0) := \lim_{\Delta f \to 0} p_{\mathrm{fix}}(\Delta f) = \frac{1}{N}$, so that
$p_{\mathrm{fix}}$ is continuous and well defined for all $\Delta f$. If $N = 1$, this
probability will be $p_{\mathrm{fix}}(\Delta f) = 1$, meaning that any offspring
will be accepted, and if $N \to \infty$, it will only accept solutions
for which $\Delta f > 0$. This expression was first derived by
Kimura [12] and represents the probability of fixation, that
is, the probability that a gene that is initially present in one
copy in a population of $N$ individuals is eventually present
in all individuals.
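The acceptance probability can be evaluated numerically as in the following minimal sketch. The function name `p_fix`, its argument order and the overflow cut-off of 700 are choices made here for illustration, not part of the original text.

```python
import math


def p_fix(delta_f: float, N: int, beta: float) -> float:
    """Fixation probability (1 - e^{-2*beta*df}) / (1 - e^{-2*N*beta*df})."""
    if delta_f == 0:
        return 1.0 / N  # continuous extension: the limit for delta_f -> 0
    a = -2.0 * beta * delta_f  # exponent in the numerator
    b = N * a                  # exponent in the denominator
    if b > 700.0:
        # e^b would overflow a float. For N = 1 the ratio is exactly 1;
        # for N > 1 it is at most e^(a - b), which is negligibly small.
        return 1.0 if N == 1 else math.exp(a - b)
    # expm1(x) = e^x - 1 stays accurate when x is close to 0
    return math.expm1(a) / math.expm1(b)
```

For example, `p_fix(0, 10, 0.5)` returns `1/N = 0.1`, and for `N = 1` every offspring is accepted regardless of the fitness difference.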

Since the acceptance function in this algorithm depends
on the absolute difference in fitness between genotypes, we
include a parameter $\beta \in (0, 1]$ that effectively scales the
fitness function and that in population genetics models the
strength of selection on a phenotype. By incorporating $\beta$ as
a parameter of this function (and hence of the algorithm)
we avoid having to explicitly rescale the fitness functions
we analyse, while allowing us to explore the performance of
this algorithm on a family of functions. This function has
a sigmoid shape (strictly increasing, see Lemma 15) with
limits $\lim_{\Delta f \to -\infty} p_{\mathrm{fix}}(\Delta f) = 0$ and $\lim_{\Delta f \to \infty} p_{\mathrm{fix}}(\Delta f) = 1$.
As such, for large $|\beta\Delta f|$ this probability of acceptance is
close to the one in the (1+1) EA, as long as $N > 1$,
defeating the purpose of the comparison. By bounding $\beta$ by 1,
we avoid artefactual results obtained by inflating the fitness
differences between genotypes.

We can then cast the SSWM regime as Algorithm 1, where
the function mutate($x$) can be either standard bit mutation
(all bits are mutated independently with probability
$p_m = 1/n$, which we call global mutations) or flipping a
single bit chosen uniformly at random (which we call local
mutations). SSWM is valid when the expected number of
new mutants in the population is much less than one, which
implies that local mutations are a better approximation for
this regime. However, we also consider global mutations in
order to facilitate a comparison with evolutionary algorithms
such as the (1+1) EA (Algorithm 2), which uses global
mutations.


Algorithm 1 SSWM

Choose $x \in \{0, 1\}^n$ uniformly at random
repeat
    $y \leftarrow$ mutate($x$)
    $\Delta f = f(y) - f(x)$
    Choose $r \in [0, 1]$ uniformly at random
    if $r < p_{\mathrm{fix}}(\Delta f)$ then
        $x \leftarrow y$
    end if
until stop
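Algorithm 1 can be sketched in Python as follows. This is an illustrative implementation under stated assumptions: the names `sswm`, `mutate_local`, `mutate_global`, the `max_steps` cut-off standing in for the unspecified stopping criterion, and the explicit random generator argument are all introduced here.

```python
import math
import random


def p_fix(delta_f, N, beta):
    """Fixation probability; adequate for the moderate exponents used here."""
    if delta_f == 0:
        return 1.0 / N
    return math.expm1(-2.0 * beta * delta_f) / math.expm1(-2.0 * N * beta * delta_f)


def mutate_local(x, rng):
    """Local mutation: flip exactly one bit chosen uniformly at random."""
    y = list(x)
    y[rng.randrange(len(y))] ^= 1
    return y


def mutate_global(x, rng):
    """Global mutation: flip every bit independently with probability 1/n."""
    n = len(x)
    return [bit ^ 1 if rng.random() < 1.0 / n else bit for bit in x]


def sswm(f, n, N, beta, mutate, is_optimum, max_steps, rng):
    """Run SSWM; return (first generation holding an optimum, final point).

    The generation is None if no optimum was accepted within max_steps.
    """
    x = [rng.randrange(2) for _ in range(n)]
    if is_optimum(x):
        return 0, x
    for t in range(1, max_steps + 1):
        y = mutate(x, rng)
        # accept y with the fitness-dependent fixation probability
        if rng.random() < p_fix(f(y) - f(x), N, beta):
            x = y
            if is_optimum(x):
                return t, x
    return None, x
```

A possible call on OneMax with `n = 20`, `N = 3` and `beta = 1` is `sswm(sum, 20, 3, 1.0, mutate_local, lambda z: sum(z) == 20, 200000, random.Random(1))`.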


Next, we derive upper and lower bounds for $p_{\mathrm{fix}}(\Delta f)$ that
will be useful throughout the manuscript.






Algorithm 2 (1+1) EA

Choose $x \in \{0, 1\}^n$ uniformly at random
repeat
    $y \leftarrow$ mutate($x$)
    if $f(y) \ge f(x)$ then
        $x \leftarrow y$
    end if
until stop
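For comparison, a minimal Python sketch of the (1+1) EA is given below. It assumes the common convention that ties are accepted ($f(y) \ge f(x)$) and uses global mutations; the function name and the `max_steps` cut-off are illustrative choices.

```python
import random


def one_plus_one_ea(f, n, is_optimum, max_steps, rng):
    """(1+1) EA with standard bit mutation; accepts y iff f(y) >= f(x)."""
    x = [rng.randrange(2) for _ in range(n)]
    if is_optimum(x):
        return 0, x
    for t in range(1, max_steps + 1):
        # global mutation: flip each bit independently with probability 1/n
        y = [bit ^ 1 if rng.random() < 1.0 / n else bit for bit in x]
        if f(y) >= f(x):  # elitist acceptance: a worsening is never accepted
            x = y
            if is_optimum(x):
                return t, x
    return None, x
```

Unlike SSWM, the acceptance decision here ignores the size of the fitness difference and only looks at its sign.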


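Lemma 1. For every $\beta \in \mathbb{R}^+$ and $N \in \mathbb{N}^+$ the following
inequalities hold. If $\Delta f > 0$ then

\[ \frac{2\beta\Delta f}{1 + 2\beta\Delta f} \le p_{\mathrm{fix}}(\Delta f) \le \frac{2\beta\Delta f}{1 - e^{-2N\beta\Delta f}}. \]

If $\Delta f < 0$ then

\[ \frac{-2\beta\Delta f}{e^{-2N\beta\Delta f}} \le p_{\mathrm{fix}}(\Delta f) \le \frac{e^{-2\beta\Delta f}}{e^{-2N\beta\Delta f} - 1}. \]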

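Proof. In the following we frequently use $1 + x \le e^x$ and
$1 - e^{-x} \le 1$ for all $x \in \mathbb{R}$ as well as $e^x \le \frac{1}{1-x}$ for $x < 1$.

If $\Delta f > 0$,

\[ p_{\mathrm{fix}}(\Delta f) = \frac{1 - e^{-2\beta\Delta f}}{1 - e^{-2N\beta\Delta f}} \ge 1 - e^{-2\beta\Delta f} \ge 1 - \frac{1}{1 + 2\beta\Delta f} = \frac{2\beta\Delta f}{1 + 2\beta\Delta f} \]

as well as

\[ p_{\mathrm{fix}}(\Delta f) = \frac{1 - e^{-2\beta\Delta f}}{1 - e^{-2N\beta\Delta f}} \le \frac{2\beta\Delta f}{1 - e^{-2N\beta\Delta f}}. \]

If $\Delta f < 0$,

\[ p_{\mathrm{fix}}(\Delta f) = \frac{e^{-2\beta\Delta f} - 1}{e^{-2N\beta\Delta f} - 1} \le \frac{e^{-2\beta\Delta f}}{e^{-2N\beta\Delta f} - 1}. \]

Using the fact that $e^{-x} - 1 \le e^{-x}$,

\[ p_{\mathrm{fix}}(\Delta f) = \frac{e^{-2\beta\Delta f} - 1}{e^{-2N\beta\Delta f} - 1} \ge \frac{e^{-2\beta\Delta f} - 1}{e^{-2N\beta\Delta f}} \ge \frac{-2\beta\Delta f}{e^{-2N\beta\Delta f}}. \]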
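The bounds of Lemma 1 can be checked numerically with a short sketch such as the one below. The helper names `p_fix` and `lemma1_bounds` are introduced here for illustration, and the check only covers a small parameter grid, so it is a sanity test rather than a proof.

```python
import math


def p_fix(delta_f, N, beta):
    """Fixation probability for delta_f != 0."""
    return math.expm1(-2.0 * beta * delta_f) / math.expm1(-2.0 * N * beta * delta_f)


def lemma1_bounds(delta_f, N, beta):
    """Return the (lower, upper) bounds of Lemma 1 for delta_f != 0."""
    x = 2.0 * beta * delta_f
    if delta_f > 0:
        lower = x / (1.0 + x)
        upper = x / (1.0 - math.exp(-N * x))
    else:
        lower = -x / math.exp(-N * x)
        upper = math.exp(-x) / (math.exp(-N * x) - 1.0)
    return lower, upper


if __name__ == "__main__":
    for df in (1.0, -1.0):
        lo, hi = lemma1_bounds(df, 10, 0.5)
        print(df, lo, p_fix(df, 10, 0.5), hi)
```

For small $\beta\Delta f > 0$ both bounds are close to $2\beta\Delta f$ when $N$ is large, which is the approximation $p_{\mathrm{fix}}(1) \approx 2\beta$ used informally later.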



3. SSWM ON ONEMAX 

The function OneMax$(x) := \sum_{i=1}^{n} x_i$ has been studied
extensively in natural computation because of its simplicity.
It represents an easy hill climbing task, and it is the
easiest function with a unique optimum for all evolutionary
algorithms that only use standard bit mutation for variation [20].
Showing that SSWM can optimise OneMax efficiently
serves as proof of concept that SSWM is a reasonable
optimiser. It further sheds light on how to set algorithmic
parameters such as the selection strength $\beta$ and the population
size $N$. To this effect, we first show a polynomial upper
bound for the runtime of SSWM on OneMax. We then
show that SSWM exhibits a phase transition on its runtime
as a function of $N\beta$: changing this parameter by a constant
factor leads to exponential runtimes on OneMax.

Another reason why studying OneMax for SSWM makes
sense is that not all evolutionary algorithms that use
a fitness-dependent selection perform well on OneMax.
Oliveto and Witt showed that the Simple Genetic Algorithm,
which uses fitness-proportional selection, fails badly
on OneMax even within exponential time, with a very high
probability.

3.1 Upper Bound for SSWM on OneMax 

We first show the following simple lemma, which gives an 
upper bound on the probability of increasing or decreasing 
the number of ones in a search point by k in one mutation. 


Lemma 2. For any positive integer $k > 0$, let mut$(i, i \pm k)$
for $0 \le i \le n$ be the probability that a global mutation of a
search point with $i$ ones creates an offspring with $i \pm k$ ones.
Then

\[ \mathrm{mut}(i, i + k) \le \left(\frac{n - i}{n}\right)^{k} \left(1 - \frac{1}{n}\right)^{n-k} \cdot \frac{1.14}{k!} \]

\[ \mathrm{mut}(i, i - k) \le \left(\frac{i}{n}\right)^{k} \left(1 - \frac{1}{n}\right)^{n-k} \cdot \frac{1.14}{k!}. \]

The proof is omitted due to space restrictions; it uses
arguments from the proof of Lemma 2 in [20]. The second
inequality follows immediately from the first one due to the
symmetry mut$(i, i - k)$ = mut$(n - i, n - i + k)$.
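The mutation probabilities of Lemma 2 can be computed exactly for small $n$, which gives a way to sanity-check the bound. The sketch below assumes the bound has the form $(m/n)^k (1 - 1/n)^{n-k} \cdot 1.14/k!$, where $m$ is the number of bits available to flip in the favourable direction; the helper names are introduced here.

```python
from math import comb, factorial


def mut(n, i, j):
    """Exact probability that standard bit mutation (rate 1/n) turns a
    search point with i ones into one with j ones."""
    p = 1.0 / n
    total = 0.0
    for v in range(i + 1):        # v ones are flipped to zero
        u = j - i + v             # u zeros are flipped to one
        if 0 <= u <= n - i:
            total += (comb(n - i, u) * comb(i, v)
                      * p ** (u + v) * (1 - p) ** (n - u - v))
    return total


def lemma2_bound(n, available, k):
    """Assumed form of the Lemma 2 bound; `available` is n - i for
    increases and i for decreases of the number of ones."""
    return (available / n) ** k * (1 - 1 / n) ** (n - k) * 1.14 / factorial(k)


if __name__ == "__main__":
    n, i = 10, 5
    for k in (1, 2, 3):
        print(k, mut(n, i, i + k), lemma2_bound(n, n - i, k))
```

The exact values also confirm the symmetry mut$(i, i - k)$ = mut$(n - i, n - i + k)$ used for the second inequality.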

Now we introduce the concept of drift and find some 
bounds for its forward and backward expression. 


Definition 1. Let $X_t$ be the number of ones in the current
search point. For all $1 \le i \le n$ the forward and backward
drifts are

\[ \Delta^+(i) = E[X_{t+1} - i \mid X_t = i, X_{t+1} > i] \cdot P(X_{t+1} > i \mid X_t = i) \]
\[ \Delta^-(i) = E[X_{t+1} - i \mid X_t = i, X_{t+1} < i] \cdot P(X_{t+1} < i \mid X_t = i) \]

and the net drift is the expected increase in the number of
ones

\[ \Delta(i) = \Delta^+(i) + \Delta^-(i). \]


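Lemma 3. Consider SSWM on OneMax and mutation
probability $p_m = 1/n$. Then for global mutations, the forward
and backward drifts can be bounded by

\[ \Delta^+(i) \ge \frac{n - i}{n} \left(1 - \frac{1}{n}\right)^{n-1} p_{\mathrm{fix}}(1) \]

\[ |\Delta^-(i)| \le 1.14 \left(1 - \frac{1}{n}\right)^{n-1} \cdot \left(p_{\mathrm{fix}}(-1) + e \cdot p_{\mathrm{fix}}(-2)\right). \]

For local mutations the relations are as follows

\[ \Delta^+(i) = \frac{n - i}{n} \cdot p_{\mathrm{fix}}(1) \]

\[ |\Delta^-(i)| \le p_{\mathrm{fix}}(-1). \]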


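Proof. For global mutations firstly we compute the lower
bound for the forward drift,

\[ \Delta^+(i) = \sum_{j=1}^{n-i} \mathrm{mut}(i, i + j) \cdot j \cdot p_{\mathrm{fix}}(j) \]

where mut$(i, i + j)$ is the probability of mutation increasing
the OneMax value by $j$ and $i$ is the number of ones of the
current search point. Keeping only the term $j = 1$,

\[ \Delta^+(i) \ge \mathrm{mut}(i, i + 1) \cdot p_{\mathrm{fix}}(1) \ge \frac{n - i}{n} \left(1 - \frac{1}{n}\right)^{n-1} p_{\mathrm{fix}}(1). \]

Secondly we calculate the upper bound for the backward
drift

\[ |\Delta^-(i)| = \sum_{j=1}^{i} \mathrm{mut}(i, i - j) \cdot j \cdot p_{\mathrm{fix}}(-j) \]

where $j$ is now the number of new zeros. We can upper
bound mut$(i, i - j)$ by the probability of flipping any $j$ bits,
which from Lemma 2 yields

\[ |\Delta^-(i)| \le \sum_{j=1}^{i} \frac{1.14}{j!} \left(1 - \frac{1}{n}\right)^{n-j} \cdot j \cdot p_{\mathrm{fix}}(-j). \]

Separating the case $j = 1$ and bounding the remaining fixation
probabilities by $p_{\mathrm{fix}}(-2)$

\[ |\Delta^-(i)| \le 1.14 \left(1 - \frac{1}{n}\right)^{n-1} p_{\mathrm{fix}}(-1) + 1.14 \, p_{\mathrm{fix}}(-2) \sum_{j=2}^{i} \frac{1}{(j-1)!} \left(1 - \frac{1}{n}\right)^{n-j} \]

\[ \le 1.14 \left(1 - \frac{1}{n}\right)^{n-1} \left(p_{\mathrm{fix}}(-1) + e \cdot p_{\mathrm{fix}}(-2)\right). \]

Finally, the case for local mutations is straightforward since
the probability of a local mutation increasing the number of
ones is $(n - i)/n$ and that of decreasing it is at most 1. □

The following theorem shows that SSWM is efficient on
OneMax for $N\beta \ge \frac{1}{2}\ln(11n)$, since then $p_{\mathrm{fix}}(1)$ starts
being greater than $n \cdot p_{\mathrm{fix}}(-1)$, allowing for a positive drift even
on the hardest fitness level ($n - 1$ ones). The upper bound
increases with $1/\beta$; this makes sense as for small values of $\beta$
we have $p_{\mathrm{fix}}(1) \approx 2\beta$ (cf. Lemma 1). In this regime
absolute fitness differences are small and improvements are only
accepted with a small probability.

Theorem 4. For $N\beta \ge \frac{1}{2}\ln(11n)$ and $\beta \in (0, 1]$, the
expected optimisation time of SSWM on OneMax with local or
global mutations is $O\left(\frac{n \log n}{\beta}\right)$ for every initial search point.

Proof. The fixation probabilities can be bounded as follows

\[ p_{\mathrm{fix}}(1) = \frac{1 - e^{-2\beta}}{1 - e^{-2N\beta}} \ge 1 - e^{-2\beta} \]

and for $N\beta \ge \frac{1}{2}\ln(11n)$

\[ p_{\mathrm{fix}}(-1) = \frac{e^{2\beta} - 1}{e^{2N\beta} - 1} \le \frac{e^{2\beta} - 1}{11n - 1} \qquad (2) \]

\[ p_{\mathrm{fix}}(-2) = \frac{e^{4\beta} - 1}{e^{4N\beta} - 1} \le \frac{e^{4\beta} - 1}{(11n)^2 - 1} = O(n^{-2}). \]

Using Lemma 3

\[ \Delta(i) \ge \frac{1}{e} \left( \frac{n - i}{n} \left(1 - e^{-2\beta}\right) - 1.14 \left( \frac{e^{2\beta} - 1}{11n - 1} + O(n^{-2}) \right) \right). \]

We need a positive net drift even in the last step ($n - i = 1$):

\[ \Delta(n - 1) \ge \frac{1}{e} \left( \frac{1}{n} \left(1 - e^{-2\beta}\right) - 1.14 \cdot \frac{e^{2\beta} - 1}{11n - 1} \right) - O(n^{-2}) \]

\[ = \frac{1}{e} \cdot \frac{1 - e^{-2\beta}}{11n - 1} \left( 11 - \frac{1}{n} - 1.14 \cdot \frac{e^{2\beta} - 1}{1 - e^{-2\beta}} \right) - O(n^{-2}) \]

using the relation $\frac{e^x - 1}{1 - e^{-x}} = e^x$

\[ = \frac{1}{e} \cdot \frac{1 - e^{-2\beta}}{11n - 1} \left( 11 - \frac{1}{n} - 1.14 \cdot e^{2\beta} \right) - O(n^{-2}) \]

since $\beta \in (0, 1]$ then $e^{2\beta} \le e^2 < 7.5$

\[ \ge \frac{1.5}{e} \cdot \frac{1 - e^{-2\beta}}{11n - 1} - O(n^{-2}) \]

also for $\beta \in (0, 1]$ we have $1.5(1 - e^{-2\beta}) \ge \beta$

\[ \ge \frac{1}{e} \cdot \frac{\beta}{11n - 1} - O(n^{-2}) \]

which is positive for large enough $n$.

Therefore we can lower bound the drift in any point as

\[ \Delta(i) = \Omega\left(\frac{n - i}{n} \cdot \beta\right). \qquad (3) \]

Now we apply Johannsen's variable drift theorem to
the number of zeros. Using $h(z) := E(X_t - X_{t+1} \mid X_t = z)$
then

\[ E(T \mid X_0) \le \frac{z_{\min}}{h(z_{\min})} + \int_{z_{\min}}^{X_0} \frac{1}{h(z)} \, dz \]

where $z$ is the number of zeros, $X_t$ the current state and $T$
the optimisation time. Introducing $z_{\min} = 1$, $X_0 = n$ and
$\Delta(z) = \Omega\left(\frac{z}{n} \cdot \beta\right) = h(z)$
we obtain an upper bound for the runtime

\[ E(T \mid X_0) \le O\left(\frac{n}{\beta}\right) + O\left(\frac{n}{\beta}\right) \int_{1}^{n} \frac{1}{z} \, dz = O\left(\frac{n}{\beta}\right) + O\left(\frac{n \log n}{\beta}\right). \] □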
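For local mutations the number of ones performs a birth-death process, so the expected optimisation time can be computed exactly and compared with the $O((n \log n)/\beta)$ bound. The sketch below is an illustration under assumptions made here: the function names are hypothetical, and the product $N\beta$ is passed as a real number `N_beta` rather than derived from an integer population size.

```python
import math


def p_fix(delta_f, N_beta, beta):
    """Fixation probability written in terms of beta and the product N*beta."""
    return math.expm1(-2.0 * beta * delta_f) / math.expm1(-2.0 * N_beta * delta_f)


def expected_time_local(n, beta, N_beta, start=0):
    """Exact expected number of generations until SSWM with local mutations
    first holds the all-ones string on OneMax, starting with `start` ones."""
    up, down = p_fix(1, N_beta, beta), p_fix(-1, N_beta, beta)
    total, e_prev = 0.0, 0.0
    for i in range(n):
        a = (n - i) / n * up          # probability of moving from i to i + 1 ones
        b = i / n * down              # probability of moving from i to i - 1 ones
        e_i = (1.0 + b * e_prev) / a  # expected time to first reach i + 1 from i
        if i >= start:
            total += e_i
        e_prev = e_i
    return total


if __name__ == "__main__":
    n = 100
    N_beta = 0.5 * math.log(11 * n)
    for beta in (1.0, 0.5, 0.1):
        t = expected_time_local(n, beta, N_beta)
        print(beta, t, t * beta / (n * math.log(n)))
```

The recurrence follows from a first-step analysis: from state $i$ the process moves up with probability $a_i$, down with probability $b_i$, and otherwise stays, which gives $E_i = (1 + b_i E_{i-1})/a_i$.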

3.2 A Critical Threshold for SSWM on OneMax

The upper bound from Theorem 4 required $N\beta \ge
\frac{1}{2}\ln(11n) = \frac{1}{2}\ln(n) + O(1)$. This condition is vital since if
$N\beta$ is chosen too small, the runtime of SSWM on OneMax
is exponential with very high probability, as we show next.

If $N\beta$ is by a factor of $1 - \varepsilon$, for some constant $\varepsilon > 0$,
smaller than $\frac{1}{2}\ln n$, the optimisation time is exponential
in $n$, with overwhelming probability. SSWM therefore
exhibits a phase transition behaviour: changing $N\beta$ by a
constant factor makes a difference between polynomial and
exponential expected optimisation times on OneMax.
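The phase transition can be made concrete for local mutations, where the net drift at $i$ ones is exactly $\frac{n-i}{n} p_{\mathrm{fix}}(1) - \frac{i}{n} p_{\mathrm{fix}}(-1)$. The sketch below evaluates this drift on the hardest level $i = n - 1$ for the two regimes; the function names are hypothetical and $N\beta$ is treated as a real-valued product for illustration.

```python
import math


def p_fix(delta_f, N_beta, beta):
    """Fixation probability written in terms of beta and the product N*beta."""
    return math.expm1(-2.0 * beta * delta_f) / math.expm1(-2.0 * N_beta * delta_f)


def net_drift_local(i, n, beta, N_beta):
    """Expected change in the number of ones at i ones under local mutations."""
    gain = (n - i) / n * p_fix(1, N_beta, beta)   # a zero flips and is accepted
    loss = i / n * p_fix(-1, N_beta, beta)        # a one flips and is accepted
    return gain - loss


if __name__ == "__main__":
    n, beta = 1000, 1.0
    above = 0.5 * math.log(11 * n)   # threshold of Theorem 4
    below = 0.25 * math.log(n)       # (1 - eps)/2 * ln(n) with eps = 1/2
    print(net_drift_local(n - 1, n, beta, above))  # positive drift
    print(net_drift_local(n - 1, n, beta, below))  # negative drift
```

With $N\beta$ at the threshold the drift on the last level is positive, whereas halving the constant in front of $\ln n$ makes accepted worsenings dominate near the optimum.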





































Theorem 5. If $1 \le N\beta \le \frac{1 - \varepsilon}{2} \ln n$ for some $0 < \varepsilon < 1$,
then the optimisation time of SSWM with local or global
mutations on OneMax is at least $2^{cn^{\varepsilon/2}}$ with probability
$1 - 2^{-\Omega(n^{\varepsilon/2})}$, for some constant $c > 0$.

The idea behind the proof of Theorem 5 is to show that
for all search points with at least $n - n^{\varepsilon/2}$ ones, there is a
negative drift for the number of ones. This is because for
small $N\beta$ the selection pressure is too weak, and worsenings
in fitness are more likely than steps where mutation leads
the algorithm closer to the optimum.

We then use the negative drift theorem with self-loops
presented in Rowe and Sudholt [19] (an extension of the
negative drift theorem [16] to stochastic processes with large
self-loop probabilities). It is stated in the following for the
sake of completeness.


Theorem 6 (Negative drift with self-loops [19]).
Consider a Markov process $X_0, X_1, \ldots$ on $\{0, \ldots, m\}$ and
suppose there exist integers $a, b$ with $0 < a < b \le m$ and
$\varepsilon > 0$ such that for all $a \le k \le b$ the expected drift towards 0
is

\[ E(k - X_{t+1} \mid X_t = k) < -\varepsilon \cdot (1 - p_{k,k}) \]

where $p_{k,k}$ is the self-loop probability at state $k$. Further
assume there exist constants $r, \delta > 0$ (i.e. they are
independent of $m$) such that for all $k \ge 1$ and all $d \ge 1$

\[ p_{k,k-d}, \; p_{k,k+d} \le \frac{r(1 - p_{k,k})}{(1 + \delta)^d}. \]

Let $T$ be the first hitting time of a state at most $a$, starting
from $X_0 \ge b$. Let $\ell = b - a$. Then there is a constant $c > 0$
such that

\[ \Pr\left(T \le 2^{c\ell/r}\right) = 2^{-\Omega(\ell/r)}. \]

The proof of Theorem 5 applies Theorem 6 with respect
to the number of zeros on an interval of $[0, n^{\varepsilon/2}]$.


Proof of Theorem 5. We only give a proof for global mutations;
the same analysis goes through for local mutations
with similar, but simpler calculations.

Let $p_{k,j}$ be the probability that SSWM will make a transition
from a search point with $k$ ones to one with $j$ ones.
We start by pessimistically estimating transition probabilities
and applying the negative drift theorem with regards to
pessimistic transition probabilities $p'_{k,j}$ defined later. The
drift theorem will be applied, taking the number of zeros as
distance function to the optimum. Our notation refers to
numbers of ones for simplicity. Throughout the remainder
of the proof we assume $k \ge n - n^{\varepsilon/2}$.

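From Lemma 2, for every $1 \le j \le n - k$ we have

\[ p_{k,k+j} \le \frac{1.14}{j!} \left(\frac{n - k}{n}\right)^{j} \left(1 - \frac{1}{n}\right)^{n-j} \cdot p_{\mathrm{fix}}(j) \le \frac{1.14}{j!} \left(\frac{n - k}{n}\right)^{j} \cdot p_{\mathrm{fix}}(j) \le 1.14 \cdot \left(n^{\varepsilon/2 - 1}\right)^{j} \cdot p_{\mathrm{fix}}(j). \qquad (4) \]

Following Lemma 1 we estimate $p_{\mathrm{fix}}(j)$ by $p_{\mathrm{fix}}(j) \le \frac{2\beta j}{1 - e^{-2N\beta j}}$.
This gives

\[ p_{k,k+j} \le \left(n^{\varepsilon/2 - 1}\right)^{j} \cdot \frac{3\beta j}{1 - e^{-2N\beta j}} =: p'_{k,k+j}. \]

The expected drift towards the optimum, $\Delta^+(k)$, is then
bounded as follows

\[ \Delta^+(k) \le \sum_{j=1}^{n-k} j \cdot p'_{k,k+j} = \sum_{j=1}^{n-k} \left(n^{\varepsilon/2 - 1}\right)^{j} \cdot \frac{3\beta j^2}{1 - e^{-2N\beta j}} \le \frac{3\beta}{1 - e^{-2N\beta}} \sum_{j=1}^{\infty} j^2 \cdot \left(n^{\varepsilon/2 - 1}\right)^{j}. \]

Using $\sum_{j=1}^{\infty} j^2 \cdot x^j = \frac{x(1 + x)}{(1 - x)^3} \le x(1 + 5x)$ for $0 \le x \le 0.09$
(this holds for large enough $n$ as $x = n^{\varepsilon/2 - 1} = o(1)$) as well
as $N\beta \ge 1$

\[ \Delta^+(k) \le \frac{3\beta \cdot n^{\varepsilon/2 - 1}}{1 - e^{-2}} \cdot \left(1 + 5n^{\varepsilon/2 - 1}\right). \]

On the other hand,

\[ p_{k,k-1} \ge \frac{k}{n} \left(1 - \frac{1}{n}\right)^{n-1} \cdot p_{\mathrm{fix}}(-1) \ge \frac{1 - n^{\varepsilon/2 - 1}}{e} \cdot p_{\mathrm{fix}}(-1) \ge \frac{1 - n^{\varepsilon/2 - 1}}{e} \cdot \frac{2\beta}{e^{2N\beta}} \]

using $e^{2N\beta} \le e^{(1 - \varepsilon)\ln n} = n^{1 - \varepsilon}$

\[ \ge \frac{2\beta \cdot n^{\varepsilon}}{en} \left(1 - n^{\varepsilon/2 - 1}\right) =: p'_{k,k-1}. \]

We further define $p'_{k,k-j} := 0$ for $j \ge 2$. The expected
increase in the number of ones at state $k$, denoted $\Delta'(k)$,
with regards to the pessimistic Markov chain defined by $p'_{k,j}$
is hence at most

\[ \Delta'(k) \le \sum_{j=1}^{n-k} j \cdot p'_{k,k+j} - p'_{k,k-1} \]

\[ \le \frac{3\beta \cdot n^{\varepsilon/2 - 1}}{1 - e^{-2}} \cdot \left(1 + 5n^{\varepsilon/2 - 1}\right) - \frac{2\beta \cdot n^{\varepsilon}}{en} \cdot \left(1 - n^{\varepsilon/2 - 1}\right) \]

\[ = 2\beta \cdot n^{\varepsilon/2 - 1} \cdot \left( \frac{3}{2} \cdot \frac{1 + 5n^{\varepsilon/2 - 1}}{1 - e^{-2}} - \frac{n^{\varepsilon/2}}{e} \cdot \left(1 - n^{\varepsilon/2 - 1}\right) \right) \]

\[ = -\Omega\left(\beta \cdot n^{\varepsilon - 1}\right). \]

Now, the self-loop probability for the pessimistic Markov
chain is at least $p'_{k,k} \ge 1 - \sum_{j=1}^{n-k} p'_{k,k+j} - p'_{k,k-1} \ge 1 -
\sum_{j=1}^{n-k} j \cdot p'_{k,k+j} - p'_{k,k-1} = 1 - O(\beta n^{\varepsilon - 1})$, hence the first
condition of the drift theorem is satisfied.

The second condition on exponentially decreasing transition
probabilities follows from $p'_{k,k-1} \le 1 - p'_{k,k}$, $p'_{k,k-j} = 0$
for $j \ge 2$ and, for all $j \in \mathbb{N}$,

\[ p'_{k,k+j} = \left(n^{\varepsilon/2 - 1}\right)^{j} \cdot \frac{3\beta j}{1 - e^{-2N\beta j}} \le \left(n^{\varepsilon/2 - 1}\right)^{j} \cdot \frac{3\beta j}{1 - e^{-2}} \]

multiplying by $p'_{k,k-1}/p'_{k,k-1}$

\[ = p'_{k,k-1} \cdot \left(n^{\varepsilon/2 - 1}\right)^{j} \cdot \frac{3\beta j}{1 - e^{-2}} \cdot \frac{en}{2\beta \cdot n^{\varepsilon} \left(1 - n^{\varepsilon/2 - 1}\right)} \]

\[ = p'_{k,k-1} \cdot j \left(n^{\varepsilon/2 - 1}\right)^{j-1} \cdot \frac{n^{-\varepsilon/2}}{1 - n^{\varepsilon/2 - 1}} \cdot \frac{3e}{2\left(1 - e^{-2}\right)} \]

\[ \le p'_{k,k-1} \cdot \frac{2}{2^j} \le \left(1 - p'_{k,k}\right) \cdot \frac{2}{2^j} \]

where the penultimate inequality holds for large enough $n$.
This proves the second condition for $\delta := 1$ and $r := 2$. Now
the negative drift theorem, applied to the number of zeros,
proves the claimed result. □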

4. ON TRAVERSING FITNESS VALLEYS 

We have shown that with the right parameters, SSWM
is an efficient hill climber. On the other hand, in contrast
to the (1+1) EA, SSWM can accept worse solutions with
a probability that depends on the magnitude of the fitness
decrease. This is reminiscent of the Metropolis algorithm,
although the latter accepts every improvement with
probability 1, whereas SSWM may reject improvements.

Jansen and Wegener [10] compared the ability of the
(1+1) EA and a Metropolis algorithm in crossing fitness
valleys and found that both showed similar performance on
smooth integer functions: functions where two Hamming
neighbours have a fitness difference of at most 1 [10, Section 6].

We consider a similar function, generalising a construction
by Jägersküpper and Storch [7]: the function Cliff$_d$
is defined such that non-elitist algorithms have a chance to
jump down a "cliff" of height roughly $d$ and to traverse a
fitness valley of Hamming distance $d$ to the optimum (see
Figure 1).


Figure 1: Sketch of the function Cliff$_d$ (vertical axis: fitness).


Definition 2 (Cliff).

\[ \mathrm{Cliff}_d(x) = \begin{cases} |x|_1 & \text{if } |x|_1 \le n - d \\ |x|_1 - d + 1/2 & \text{otherwise} \end{cases} \]

where $|x|_1 = \sum_{i=1}^{n} x_i$ counts the number of ones.
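A direct Python transcription of the Cliff function is sketched below. It assumes the offset $1/2$ beyond the cliff, which is the value consistent with the acceptance probabilities $p_{\mathrm{fix}}(k - d + 1/2)$ used later in this section; the function name `cliff` is chosen here.

```python
def cliff(x, d):
    """Cliff_d: OneMax up to n - d ones, then a drop of roughly d."""
    n = len(x)
    ones = sum(x)
    if ones <= n - d:
        return ones             # slope leading up to the top of the cliff
    return ones - d + 0.5       # valley behind the cliff, rising to the optimum


if __name__ == "__main__":
    n, d = 10, 3
    for ones in range(n + 1):
        x = [1] * ones + [0] * (n - ones)
        print(ones, cliff(x, d))
```

With $n = 10$ and $d = 3$ the peak at 7 ones has fitness 7, the next point drops to 5.5, and the all-ones string has fitness 7.5, so it remains the unique global optimum.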


The (1+1) EA typically optimises Cliff$_d$ through a direct
jump from the top of the cliff to the optimum, which takes
expected time $\Theta(n^d)$.

Theorem 7. The expected optimisation time of the
(1+1) EA on Cliff$_d$, for $2 \le d \le n/2$, is $\Theta(n^d)$.

In order to prove Theorem 7 the following lemma will
be useful for showing that the top of the cliff is reached
with good probability. More generally, it shows that the
conditional probability of increasing the number of ones in
a search point to $j$, given it is increased to some value of $j$
or higher, is at least $1/2$.

Lemma 8. For all $0 \le i < j \le n$,

\[ \frac{\mathrm{mut}(i, j)}{\sum_{k=j}^{n} \mathrm{mut}(i, k)} \ge \frac{1}{2}. \]


The proof of this lemma is presented in the appendix. 
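Lemma 8 can be checked numerically for small $n$ by computing the mutation probabilities exactly. The following sketch is a sanity check on a finite range, not a substitute for the proof; the helper names `mut` and `lemma8_ratio` are introduced here.

```python
from math import comb


def mut(n, i, j):
    """Exact probability that standard bit mutation (rate 1/n) turns a
    search point with i ones into one with j ones."""
    p = 1.0 / n
    total = 0.0
    for v in range(i + 1):        # v ones are flipped to zero
        u = j - i + v             # u zeros are flipped to one
        if 0 <= u <= n - i:
            total += (comb(n - i, u) * comb(i, v)
                      * p ** (u + v) * (1 - p) ** (n - u - v))
    return total


def lemma8_ratio(n, i, j):
    """Probability of reaching exactly j ones, given at least j are reached."""
    return mut(n, i, j) / sum(mut(n, i, k) for k in range(j, n + 1))


if __name__ == "__main__":
    n = 12
    worst = min(lemma8_ratio(n, i, j)
                for i in range(n) for j in range(i + 1, n + 1))
    print(worst)
```

The smallest ratio over all pairs $i < j$ gives an empirical view of how much slack the constant $1/2$ leaves for a given $n$.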

Proof of Theorem 7. From any search point with $i < n - d$
ones, the probability of reaching a search point with higher
fitness is at least $\frac{n - i}{en}$. The expected time for accepting
a search point with at least $n - d$ ones is at most
$\sum_{i=0}^{n-d-1} \frac{en}{n - i} = O(n \log n)$. Note that this is $O(n^d)$ since
$d \ge 2$.

We claim that with probability $\Omega(1)$, the first such search
point has $n - d$ ones: with probability at least $1/2$ the initial
search point will have at most $n - d$ ones. Invoking Lemma 8
with $j := n - d$, with probability at least $1/2$ the top of the
cliff is reached before any other search point with at least
$n - d$ ones.

Once on the top of the cliff the algorithm has to jump
directly to the optimum to overcome it. The probability of
such a jump is $\frac{1}{n^d}\left(1 - \frac{1}{n}\right)^{n-d}$ and therefore the expected time
to make this jump is $\Theta(n^d)$. □

SSWM with global mutations also has an opportunity to
make a direct jump to the optimum. However, compared
to the (1+1) EA its performance slightly improves when
considering shorter jumps and accepting a search point of
inferior fitness. The following theorem shows that for large
enough cliffs, $d = \omega(\log n)$, the expected optimisation time
is by a factor of $e^{\Omega(d)}$ smaller than that of the (1+1) EA.
Although both algorithms need a long time for large $d$, the
speedup of SSWM is significant for large $d$.

Theorem 9. The expected optimisation time of SSWM
with global mutations and $\beta = 1$, $N = \frac{1}{2}\ln(11n)$ on Cliff$_d$
with $d = \omega(\log n)$ is at most $n^d / e^{\Omega(d)}$.

Proof. We define $R$ as the expected time for reaching a
search point with either $n - d$ or $n$ ones, when starting
with a worst possible non-optimal search point. Let $T_{\mathrm{peak}}$
be the random optimisation time when starting with any
search point of $n - d$ ones, hereinafter called a peak. Then
the expected optimisation time from any initial point is at
most $R + E(T_{\mathrm{peak}})$. Let $p_{\mathrm{success}}$ be the probability that SSWM
starting in a peak will reach the optimum before reaching a
peak again. We call such a time period a trial. After the
end of a trial, taking at most $R$ expected generations, with
probability $1 - p_{\mathrm{success}}$ SSWM returns to a peak again, so

\[ E(T_{\mathrm{peak}}) \le R + (1 - p_{\mathrm{success}}) \cdot E(T_{\mathrm{peak}}) \iff E(T_{\mathrm{peak}}) \le \frac{R}{p_{\mathrm{success}}}. \qquad (5) \]

We first bound the worst-case time to return to a peak or a
global optimum as $R = O(n \log n)$. Let $S_1$ be the set of all
search points with at most $n - d$ ones and $S_2 := \{0, 1\}^n \setminus S_1$.
As long as the current search point remains within $S_2$,
SSWM essentially behaves like on OneMax. Repeating
arguments from the proof of Theorem 4, in expected time
$O((n \log n)/\beta) = O(n \log n)$ (as here $\beta = 1$) SSWM either
finds a global optimum or a search point in $S_1$. Likewise, as
long as the current search point remains within $S_1$, SSWM
essentially behaves like on OneMax and within expected
time $O(n \log n)$ either a peak or a search point in $S_2$ is found.

SSWM can switch indefinitely between $S_1$ and $S_2$ within
one trial, as long as no optimum or peak is reached. The
conditional probability of creating a peak, when from a search
point with $i < n - d$ ones either a peak or a non-optimal
search point in $S_2$ is reached, is

\[ \frac{\mathrm{mut}(i, n - d) \cdot p_{\mathrm{fix}}(n - d - i)}{\mathrm{mut}(i, n - d) \cdot p_{\mathrm{fix}}(n - d - i) + \sum_{k=n-d+1}^{n-1} \mathrm{mut}(i, k) \cdot p_{\mathrm{fix}}(k - i - d + 1/2)} \ge \frac{\mathrm{mut}(i, j)}{\sum_{k=j}^{n} \mathrm{mut}(i, k)} \]

with $j := n - d$, as $p_{\mathrm{fix}}(n - d - i) \ge p_{\mathrm{fix}}(k - i - d + 1/2)$ for all $n - d < k < n$. By
Lemma 8 the above fraction is at least $1/2$. Hence SSWM
in expectation only makes $O(1)$ transitions from $S_1$ to $S_2$,
and the overall expected time spent in $S_1$ and $S_2$ is at most
$R = O(1) \cdot O(n \log n)$.

The remainder of the proof now shows a lower bound on
$p_{\mathrm{success}}$, the probability of a trial being successful. A
sufficient condition for a successful trial is that the next mutation
creates a search point with $n - d + k$ ones, for some integer
$1 \le k \le d$, that this point is accepted, and that from there
the global optimum is reached before returning to a peak.

We estimate the probabilities for these events separately
in order to get an overall lower bound on the probability of
a trial being successful.

From any peak there are $\binom{d}{k}$ search points at Hamming
distance $k$ that have $n - d + k$ ones. Considering only such
mutations, the probability of a mutation increasing the
number of ones from $n - d$ by $k$ is at least

\[ \mathrm{mut}(n - d, n - d + k) \ge \frac{1}{n^k} \left(1 - \frac{1}{n}\right)^{n-k} \binom{d}{k} \ge \frac{1}{en^k} \left(\frac{d}{k}\right)^{k}. \]

The probability of accepting such a move is

\[ p_{\mathrm{fix}}(k - d + 1/2) = \frac{e^{2\beta(d - k - 1/2)} - 1}{e^{2N\beta(d - k - 1/2)} - 1} \ge \frac{e^{2(d - k - 1/2)} - 1}{(11n)^{(d - k - 1/2)}}. \]

We now fix $k := \lfloor d/e \rfloor$ and estimate the probability of
making and accepting a jump of length $k$:

\[ \mathrm{mut}(n - d, n - d + k) \cdot p_{\mathrm{fix}}(k - d + 1/2) \ge \frac{1}{en^k} \left(\frac{d}{k}\right)^{k} \cdot \frac{e^{2(d - k - 1/2)} - 1}{(11n)^{(d - k - 1/2)}}. \]

Finally, we show that, if SSWM does make this accepted
jump, with high probability it climbs up to the global
optimum before returning to a search point in $S_1$. To this end
we work towards applying the negative drift theorem to the
number of ones in the interval $[a := \lceil n - d + k/2 \rceil, b :=
n - d + k]$ and show that, since we start in state $b$, a state $a$
or less is unlikely to be reached in polynomial time.

We first show that the drift is typically equal to that on
OneMax. For every search point with more than $a$ ones,
in order to reach $S_1$, at least $k/2$ bits have to flip. Until
this happens, SSWM behaves like on OneMax and hence
reaches either a global optimum or a point in $S_1$ in expected
time $O(n \log n)$. The probability for a mutation flipping at
least $k/2$ bits is at most $1/((\log n)/(2e))! = n^{-\Omega(\log\log n)}$, so the
probability that this happens in expected time $O(n \log n)$ is
still $n^{-\Omega(\log\log n)}$.

Assuming such jumps do not occur, we can then use drift
bounds from the analysis of OneMax for states with at least
$a$ ones. From the proof of Theorem 4 and (3) we know that
the drift at $i$ ones for $\beta = 1$ is at least

\[ \Delta(i) \ge \Omega\left(\frac{n - i}{n}\right). \]

Let $p_{i,j}$ denote the transition probability from a state with
$i$ ones to one with $j$ ones. The probability of decreasing the
current state is at most $p_{\mathrm{fix}}(-1) = O(1/n)$ due to (2). The
probability of increasing the current state is at most $(n - i)/n$
as a necessary condition is that one out of $n - i$ zeros needs
to flip. Hence for $i \le b$, which implies $n - i = \omega(1)$, the
self-loop probability is at least

\[ p_{i,i} \ge 1 - O\left(\frac{1}{n}\right) - \frac{n - i}{n} = 1 - O\left(\frac{n - i}{n}\right). \]

Together, we get $\Delta(i) \ge \Omega(1 - p_{i,i})$, establishing the first
condition of Theorem 6.

Note that $p_{\mathrm{fix}}(1) = \frac{1 - e^{-2}}{1 - 1/(11n)} = \Omega(1)$, hence

\[ 1 - p_{i,i} \ge p_{i,i+1} \ge \frac{n - i}{en} \cdot p_{\mathrm{fix}}(1) = \Omega\left(\frac{n - i}{n}\right). \qquad (6) \]

The second condition follows for improving jumps from $i$ to
$i + j$, $j \ge 1$, from Lemma 2 and (6):

\[ p_{i,i+j} \le \left(\frac{n - i}{n}\right)^{j} \cdot \frac{1.14}{j!} \cdot p_{\mathrm{fix}}(j) \le \frac{n - i}{n} \cdot \frac{1.14}{j!} \le (1 - p_{i,i}) \cdot \frac{O(1)}{2^j}. \]

For backward jumps we get, for $1 \le j \le k/2$, and $n$ large
enough,

\[ p_{i,i-j} \le p_{\mathrm{fix}}(-j) = \frac{e^{2j} - 1}{e^{2Nj} - 1} = \frac{e^{2j} - 1}{(11n)^j - 1} \le 2^{-j}. \]

Now Theorem 6 can be applied with $r = O(1)$ and $\delta = 1$
and it yields that the probability of reaching a state of $a$ or
less in $n^{\omega(1)}$ steps is $n^{-\omega(1)}$.

This implies that following a length-$k$ jump, a trial is
successful with probability $1 - n^{-\omega(1)}$. This establishes
$p_{\mathrm{success}} = \Omega\left(n^{-d+1/2} \cdot (10/9)^d\right)$. Plugging this into (5),
adding time $R$ for the time to reach the peak initially, and
using that $O(n^{1/2} \log n) \cdot (9/10)^d = e^{-\Omega(d)}$ for $d = \omega(\log n)$
yields the claimed bound. □
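The gain from a shorter, accepted jump can be illustrated numerically. The sketch below evaluates, in log space, the lower bound $\frac{1}{en^k}(d/k)^k$ on a mutation of length $k = \lfloor d/e \rfloor$ from a peak, multiplied by the acceptance probability $p_{\mathrm{fix}}(k - d + 1/2)$ with $\beta = 1$ and $N = \frac{1}{2}\ln(11n)$, and compares it with the probability of a direct jump of length $d$. The function names are hypothetical, and the comparison covers only the first step of a trial, not the full success probability.

```python
import math


def log_accepted_jump_bound(n, d):
    """log of the lower bound on making and accepting a jump of length
    k = floor(d/e) from a peak (beta = 1, N = ln(11n)/2)."""
    k = math.floor(d / math.e)
    if k < 1:
        raise ValueError("d must be at least 3 so that k >= 1")
    y = d - k - 0.5                   # fitness loss of the jump: -(k - d + 1/2)
    N = 0.5 * math.log(11 * n)
    # log of (d/k)^k / (e * n^k)
    log_mut = -1.0 - k * math.log(n) + k * math.log(d / k)
    # log p_fix(-y) = log(e^{2y} - 1) - log(e^{2Ny} - 1),
    # using log(e^z - 1) = z + log1p(-e^{-z}) for z > 0
    log_num = 2 * y + math.log1p(-math.exp(-2 * y))
    log_den = 2 * N * y + math.log1p(-math.exp(-2 * N * y))
    return log_mut + log_num - log_den


def log_direct_jump(n, d):
    """log of the probability that mutation flips exactly the d missing bits."""
    return -d * math.log(n) + (n - d) * math.log1p(-1.0 / n)


if __name__ == "__main__":
    n = 1000
    for d in (30, 60, 90):
        print(d, log_accepted_jump_bound(n, d) - log_direct_jump(n, d))
```

The printed difference is the logarithm of the factor by which the accepted shorter jump is more likely than the direct jump; it grows with $d$, in line with the $e^{\Omega(d)}$ speedup.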


5. SSWM OUTPERFORMS (1+1) EA ON 
BALANCE 

Finally, we investigate a feature that distinguishes SSWM 
from the (1+1) EA as well as the Metropolis algorithm: the 
fact that larger improvements are more likely to be accepted 
than smaller improvements. 

To this end, we consider the function Balance, originally introduced by Rohlfshagen, Lehre, and Yao [18] as an example where rapid dynamic changes in dynamic optimisation can be beneficial. The function has also been studied in the context of stochastic ageing by Oliveto and Sudholt [la].

In its static (non-dynamic) form, Balance can be illustrated by a two-dimensional plane, whose coordinates are determined by the number of leading ones (LO) in the first half of the bit string, and the number of ones in the second half, respectively. The former has a steeper gradient than the latter, as the leading ones part is weighted by a factor of $n$ in the fitness (see Figure 2).

Figure 2: Visualisation of Balance [18]. The plot labels its regions by fitness value: $n^3$, $n \cdot \mathrm{LO}(a) + |b|_1$, $n^2 \cdot \mathrm{LO}(a)$, and $0$; one axis is $\mathrm{LO}(a)$.


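Definition (Balance [18]). Let $a, b \in \{0,1\}^{n/2}$ and $x = ab \in \{0,1\}^{n}$. Then

\[ \mathrm{Balance}(x) = \begin{cases} n^3 & \text{if } \mathrm{LO}(a) = n/2, \text{ else} \\ |b|_1 + n \cdot \mathrm{LO}(a) & \text{if } n/16 < |b|_1 < 7n/16, \text{ else} \\ n^2 \cdot \mathrm{LO}(a) & \text{if } |a|_0 > \sqrt{n}, \text{ else} \\ 0 & \text{otherwise,} \end{cases} \]

where $|x|_1 = \sum_{i=1}^{n/2} x_i$, $|x|_0$ is the number of zeros, and $\mathrm{LO}(x) := \sum_{i=1}^{n/2} \prod_{j=1}^{i} x_j$ counts the number of leading ones.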
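The case distinction in the definition can be made concrete with a small sketch. The function names `leading_ones` and `balance` are illustrative, the bit string is assumed to be a Python list of 0/1 values of even length, and the threshold $\sqrt{n}$ in the third case is an assumption taken from the reconstructed definition rather than something this sketch verifies.

```python
import math


def leading_ones(bits):
    """Number of consecutive 1-bits at the start of the sequence."""
    count = 0
    for bit in bits:
        if bit != 1:
            break
        count += 1
    return count


def balance(x):
    """Static Balance for a 0/1 list x of even length n, split as x = ab."""
    n = len(x)
    half = n // 2
    a, b = x[:half], x[half:]
    lo = leading_ones(a)
    ones_b = sum(b)
    zeros_a = half - sum(a)
    if lo == half:                        # all leading ones set: global optimum
        return n ** 3
    if n / 16 < ones_b < 7 * n / 16:      # regular region
        return ones_b + n * lo
    if zeros_a > math.sqrt(n):            # trap region (threshold assumed)
        return n ** 2 * lo
    return 0


# Example with n = 16: two leading ones in a, three ones in b.
x = [1, 1, 0, 0, 0, 0, 0, 0] + [1, 1, 1, 0, 0, 0, 0, 0]
print(balance(x))
```

In the regular region a single additional leading one is worth $n$ fitness units, whereas an additional one in $b$ is worth only one unit, which is the gradient difference that the analysis below exploits.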


The function is constructed in such a way that all points with a maximum number of leading ones are global optima, whereas increasing the number of ones in the second half beyond a threshold of $7n/16$ (or decreasing it below a symmetric threshold of $n/16$) leads to a trap, a region of local optima that is hard to escape from.

Rohlfshagen, Lehre, and Yao [18, Theorem 3] showed the following lower bound for the (1+1) EA, specialised to non-dynamic optimisation:

Theorem 10 ([18]). The expected optimisation time of the (1+1) EA on Balance is $n^{\Omega(n^{1/2})}$.

We next show that SSWM with high probability finds an optimum in polynomial time. For appropriately small $\beta$ we have sufficiently many successes on the LO-part such that we find an optimum before the OneMax-part reaches the region of local optima. This is because for small $\beta$ the probability of accepting small improvements is small. The fact that SSWM is slower than the (1+1) EA on OneMax by a factor of $O(1/\beta)$ turns into an advantage over the (1+1) EA on Balance.

The following lemma shows that SSWM effectively uses 
elitist selection on the LO-part of the function in a sense that 
every decrease is rejected, with overwhelming probability. 

Lemma 11. For every $x = ab$ with $n/16 < |b|_1 < 7n/16$ and $\beta = n^{-3/2}$ and $N\beta = \ln n$, the probability of SSWM with local or global mutations accepting a mutant $x' = a'b'$ with $\mathrm{LO}(a') < \mathrm{LO}(a)$ and $n/16 < |b'|_1 < 7n/16$ is $O(n^{-n})$.


Proof. The loss in fitness is at least $n - (|b'|_1 - |b|_1) \ge n/2$. The probability of SSWM accepting such a loss is at most

\[ p_{\mathrm{fix}}(-n/2) = \frac{1 - e^{-2\beta(-n/2)}}{1 - e^{-2N\beta(-n/2)}} \le \frac{e^{2\beta(n/2)}}{e^{2N\beta(n/2)} - 1}. \]

Assuming $\beta = n^{-3/2}$ and $N\beta = \ln n$, this is at most

\[ \frac{e^{n^{-1/2}}}{n^{n} - 1} \le \frac{e}{n^{n} - 1} = O(n^{-n}). \qquad \Box \]
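The bound in this proof can be checked numerically for small $n$. The following is a minimal sketch, assuming the acceptance probability $p_{\mathrm{fix}}(\Delta f) = (1 - e^{-2\beta\Delta f})/(1 - e^{-2N\beta\Delta f})$ with the limit $1/N$ at $\Delta f = 0$; the function name `p_fix` and the chosen values of $n$ are illustrative only.

```python
import math


def p_fix(delta_f, beta, N):
    """Acceptance probability (1 - e^{-2*beta*df}) / (1 - e^{-2*N*beta*df})."""
    if delta_f == 0:
        return 1.0 / N  # limit of the ratio as delta_f tends to 0
    # expm1(y) = e^y - 1; the two minus signs of numerator and denominator cancel.
    return math.expm1(-2.0 * beta * delta_f) / math.expm1(-2.0 * N * beta * delta_f)


for n in (16, 32, 64):
    beta = n ** -1.5
    N = math.log(n) / beta                      # chosen so that N * beta = ln n
    loss_prob = p_fix(-n / 2, beta, N)          # accepting a fitness loss of n/2
    bound = math.e / (float(n) ** n - 1.0)      # e / (n^n - 1) from the proof
    print(n, loss_prob, bound, loss_prob <= bound)
```

Larger values of $n$ would overflow the floating-point exponential in this direct form, so the sketch is limited to small instances.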


The following lemma establishes the optimisation time of 
the SSWM algorithm on either the OneMax or the LO-part 
of Balance. 

For global mutations we restrict our considerations to relevant steps, defined as steps where none of the leading ones in the first half of the bit string is flipped. The probability of a relevant step is always at least $(1 - 1/n)^{n/2} \approx e^{-1/2}$. When using local mutations, all steps are defined as relevant.


Lemma 12. Let $\beta = n^{-3/2}$ and $N\beta = \ln n$. With probability $1 - e^{-\Omega(n^{1/2})}$, SSWM with either local or global mutations either optimises the LO part or reaches the trap (all search points with fitness $n^2 \cdot \mathrm{LO}(a)$) within

\[ T := \frac{n^2}{4} \cdot \frac{1}{p_{\mathrm{fix}}(n - \sqrt{n})} \cdot \left(1 + n^{-1/4}\right) \]

relevant steps.


Proof. Consider a relevant step, implying that global mutations will leave all leading ones intact. With probability $1/n$ a local or global mutation will flip the first 0-bit. This increases the fitness by $k \cdot n - \Delta_{\mathrm{OM}}$, where $\Delta_{\mathrm{OM}}$ is the difference in the OneMax-value of $b$ caused by this mutation and $k$ is the number of consecutive 1-bits following this bit position after mutation. The latter bits are called free riders and it is well known (see [QJ, Lemma 1 and proof of Theorem 2]) that the number of free riders follows a geometric distribution with parameter $1/2$, only capped by the number of bits to the end of the bit string $a$.

The probability of flipping at least $\sqrt{n}$ bits in one global mutation is at most $1/(\sqrt{n})! = e^{-\Omega(\sqrt{n} \log n)}$ and the probability that this happens at least once in $T$ relevant steps is still of the same order (using that $T = \mathrm{poly}(n)$ as $p_{\mathrm{fix}}(n - \sqrt{n}) \ge 1/N \ge 1/\mathrm{poly}(n)$). We assume in the following that this does not happen, which allows us to assume $\Delta_{\mathrm{OM}} < \sqrt{n}$. We also assume that the number of leading ones is never decreased during non-relevant steps as the probability of accepting such a fitness decrease is $O(n^{-n})$ by Lemma 11 and the expected number of non-relevant steps before $T$ relevant steps have occurred is $O(T)$.

We now have that the number of leading ones can never decrease and any increase by mutation is accepted with probability at least $p_{\mathrm{fix}}(n - \sqrt{n})$. In a relevant step, the probability of increasing the number of leading ones is hence at least $1/n \cdot p_{\mathrm{fix}}(n - \sqrt{n})$ and the expected number of such improvements in

\[ T := \frac{n^2 \left(1 + n^{-1/4}\right)}{4\, p_{\mathrm{fix}}(n - \sqrt{n})} \]

relevant steps is at least $n/4 + n^{3/4}/4$. By Chernoff bounds [5], the probability that less than $n/4 + n^{3/4}/8$ improvements happen is $e^{-\Omega(n^{1/2})}$. Also the probability that during this number of improvements less than $n/4 - n^{3/4}/8$ free riders occur is $e^{-\Omega(n^{1/2})}$. If these two rare events do not happen, a LO-value of $n/2$ is reached before time $T$. Taking the union bound over all rare failure probabilities proves the claim. □















We now show that the OneMax part is not optimised before the LO part.

Lemma 13. Let $\beta = n^{-3/2}$, $N\beta = \ln n$, and $T$ be as in Lemma 12. The probability that SSWM starting with $a_0 b_0$ such that $n/4 \le |b_0|_1 \le n/4 + n^{3/4}$ creates a search point $ab$ with $|b|_1 \le n/16$ or $|b|_1 \ge 7n/16$ in $T$ relevant steps is $e^{-\Omega(n^{1/2})}$.

It will become obvious that in $T$ relevant steps SSWM typically makes a progress of $\Theta(n)$ on the OneMax part. The proof of Lemma 13 requires a careful and delicate analysis to show that the constant factors are small enough such that the stated thresholds for $|b|_1$ are not surpassed.

Proof of Lemma 13. We only prove that a search point with $|b|_1 \ge 7n/16$ is unlikely to be reached with the claimed probability. The probability for reaching a search point with $|b|_1 \le n/16$ is clearly no larger, and a union bound for these two events leads to a factor of 2 absorbed in the asymptotic notation.

Note that for $\beta = n^{-3/2}$ we have

\[ p_{\mathrm{fix}}(n - \sqrt{n}) \ge \frac{2\beta(n - \sqrt{n})}{1 + 2\beta(n - \sqrt{n})} \ge \frac{2}{\sqrt{n}} \cdot \left(1 - O(n^{-1/2})\right). \]

Hence

\[ T \le \frac{n^{5/2}}{8} \cdot \left(1 + O(n^{-1/4})\right). \]

We call a relevant step improving if the number of ones in $b$ increases and the step is accepted.

We first consider only steps where the number of leading ones stays the same. Then the probability that the OneMax value increases from $k$ by $j$, adapting Lemma [2] to a string of length $n/2$, is at most

\[ p_j \le \frac{1.14}{j!} \left(\frac{n/2 - k}{n}\right)^{j} \cdot p_{\mathrm{fix}}(j), \]

and, using $n/2 - k \le n/4$,

\[ p_j \le \frac{1.14 \cdot 4^{-j}}{j!} \cdot p_{\mathrm{fix}}(j) \le \frac{1.14 \cdot 4^{-j}}{j!} \cdot \frac{2\beta j}{1 - e^{-2N\beta j}} \le 2.28\beta \cdot 4^{-j} \cdot \frac{1}{1 - e^{-2N\beta j}} =: \tilde{p}_j. \]

In the following, we work with pessimistic transition probabilities $\tilde{p}_j$. Note that for all $j \ge 1$

\[ \frac{\tilde{p}_j}{\tilde{p}_1} = 4^{-(j-1)} \cdot \frac{1 - e^{-2N\beta}}{1 - e^{-2N\beta j}} \le 4^{-(j-1)}. \]

Let $p^+$ denote (a lower bound on) the probability of an improving step, then

\[ p^+ \le \sum_{j=1}^{\infty} \tilde{p}_j \le \tilde{p}_1 \cdot \sum_{j=1}^{\infty} 4^{-(j-1)} = \tilde{p}_1 \cdot \frac{4}{3}. \]

The conditional probability of advancing by $j$, given an improving step, is then

\[ \frac{\tilde{p}_j}{p^+} \le 4^{-(j-1)} \cdot \frac{\tilde{p}_1}{p^+}, \]

which corresponds to a geometric distribution with parameter $3/4$.

Now, by Chernoff bounds, the probability of having more than $S := (1 + n^{-1/4}) \cdot p^+ \cdot T$ improving steps in $T$ relevant steps is $e^{-\Omega(n^{1/2})}$. Using a Chernoff bound for geometric random variables [0 Theorem 1.14], the probability of $S$ improving steps yielding a total progress of at least $(1 + n^{-1/4}) \cdot 4/3 \cdot S$ is $e^{-\Omega(n^{1/2})}$.

If none of these rare events happen, the progress is at most

\[ \left(1 + O(n^{-1/4})\right) \cdot \frac{4}{3} \cdot p^+ \cdot T \le \left(1 + O(n^{-1/4})\right) \cdot \frac{16}{9} \cdot \tilde{p}_1 \cdot T \le \left(1 + O(n^{-1/4})\right) \cdot \frac{1.14}{9} \cdot n. \]
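The factor $4/3$ used for the jump lengths can be verified directly. The following worked computation is a sketch in which $J$ is a name introduced here for the jump length of one improving step under the pessimistic geometric distribution with parameter $3/4$.

```latex
\[
  \sum_{j=1}^{\infty} 4^{-(j-1)} = \frac{1}{1 - 1/4} = \frac{4}{3},
  \qquad
  \Pr(J = j) = \frac{3}{4} \left(\frac{1}{4}\right)^{j-1} \quad (j \ge 1),
\]
\[
  \mathrm{E}(J) = \sum_{j=1}^{\infty} j \cdot \frac{3}{4} \left(\frac{1}{4}\right)^{j-1}
               = \frac{1}{3/4} = \frac{4}{3}.
\]
```

So $S$ improving steps make an expected progress of at most $4S/3$ under the pessimistic distribution, and the Chernoff bound for geometric random variables adds only the factor $1 + n^{-1/4}$ on top of this expectation.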


We also have at most $n/2$ steps where the number of leading ones increases. If the number of leading ones increases by $d \ge 1$, the fitness increase is $dn + |b'|_1 - |b|_1$. Hence the above estimations of jump lengths are not applicable. We call these special steps; they are unorthodox as the large fitness increase makes it likely that any mutation on the OneMax part is accepted. We show that the progress on the OneMax part across all special steps is $O(n^{3/4})$ with high probability.

We grant the algorithm an advantage if we assume that, after initialising with $|b|_1 \ge n/4$, no search point with $|b|_1 < n/4$ is ever reached¹. Under this assumption we always have at least as many 1-bits as 0-bits in $b$, and mutation in expectation flips at least as many 1-bits to 0 as 0-bits to 1.

Then the progress in $|b|_1$ in one special step increasing the number of leading ones by $d$ can be described as follows. Imagine a matching (pairing) between all bits in $b$ such that each pair contains at least one 1-bit. Let $X_i$ denote the random change in $|b|_1$ by the $i$-th pair. If the pair has two 1-bits, $X_i \le 0$ with probability 1. Otherwise, we have $X_i = 1$ if the 0-bit in the pair is flipped, the 1-bit in the pair is not flipped, and the mutant is accepted (which depends on the overall $|b|_1$-value in the mutant). The potential fitness increase is at most $dn + n/2$ as the range of $|b|_1$-values is $n/2$. Likewise, we have $X_i = -1$ if the 0-bit is not flipped, the 1-bit is flipped, and the mutant is accepted (which again depends on the overall $|b|_1$-value in the mutant). The fitness increase is at least $dn - n/2$. With the remaining probability we have $X_i = 0$. Hence for global mutations (for local mutations simply drop the $1 - 1/n$ term) the total progress in a special step increasing $\mathrm{LO}(a)$ by $d$ is stochastically dominated by a sum of independent variables $Y_1, \dots, Y_{n/4}$ where $\Pr(Y_i = \pm 1) = 1/n \cdot (1 - 1/n) \cdot p_{\mathrm{fix}}(dn \pm n/2)$ and $Y_i = 0$ with the remaining probability.

There is a bias towards increasing the number of ones due to differences in the arguments of $p_{\mathrm{fix}}$: $E(Y_i) = 1/n \cdot (1 - 1/n) \cdot (p_{\mathrm{fix}}(dn + n/2) - p_{\mathrm{fix}}(dn - n/2))$. Using the definition of $p_{\mathrm{fix}}$ and preconditions $\beta = n^{-3/2}$, $N\beta = \ln n$, the bracket is bounded as

\begin{align*}
p_{\mathrm{fix}}(dn + n/2) - p_{\mathrm{fix}}(dn - n/2)
 &= \frac{1 - e^{-2dn^{-1/2} - n^{-1/2}}}{1 - n^{-2dn - n}} - \frac{1 - e^{-2dn^{-1/2} + n^{-1/2}}}{1 - n^{-2dn + n}} \\
 &= (1 + o(1)) \cdot \left( \left(1 - e^{-2dn^{-1/2} - n^{-1/2}}\right) - \left(1 - e^{-2dn^{-1/2} + n^{-1/2}}\right) \right) \\
 &= (1 + o(1)) \cdot e^{-2dn^{-1/2}} \cdot \left( e^{n^{-1/2}} - e^{-n^{-1/2}} \right) \\
 &\le (1 + o(1)) \cdot e^{-2dn^{-1/2}} \cdot \left( 1 + 2n^{-1/2} - \left(1 - n^{-1/2}\right) \right) \\
 &= (1 + o(1)) \cdot e^{-2dn^{-1/2}} \cdot 3n^{-1/2},
\end{align*}

where in the last inequality we have used $1 + x \le e^{x}$ for all $x$ and $e^{x} \le 1 + 2x$ for $0 \le x \le 1$.

¹ Otherwise, we restart our considerations from the first point in time where $|b|_1 \ge n/4$ again, replacing $T$ with the number of remaining steps. With overwhelming probability we will then again have $|b|_1 \le n/4 + n^{3/4}$.

Note that the expectation, and hence the bias, is largest for $d = 1$, in which case we get, using $e^{-2dn^{-1/2}} \le e^{-2n^{-1/2}} \le 1$,

\[ E(Y_i) \le (1 + o(1)) \cdot 1/n \cdot (1 - 1/n) \cdot 3n^{-1/2} \le 4n^{-3/2} \]

for $n$ large enough.

The total progress in all $m$ special steps is hence stochastically dominated by a sequence of $m \cdot n/4$ random variables $Y_i$ as defined above, with $d := 1$. Invoking Lemma 16, stated in the appendix, with $\delta := n^{3/4}$, the total progress in all special steps is at most $\delta + m \cdot n/4 \cdot E(Y_i) = \delta + O(n^{1/2}) = O(n^{3/4})$ with probability $1 - e^{-\Omega(n^{1/2})}$.

Hence the net gain in the number of ones in all special steps is at most $n^{3/4} + O(mn/4 \cdot n^{-3/2}) = O(n^{3/4})$ with probability $1 - e^{-\Omega(n^{1/2})}$.

Together with all regular steps, the progress on the OneMax part is at most $1.14n/9 + O(n^{3/4})$, which for large enough $n$ is less than the distance $7n/16 - (n/4 + n^{3/4})$ to reach a point with $|b|_1 \ge 7n/16$ from initialisation. This proves the claim. □

Finally, we put the previous lemmas together into our 
main theorem that establishes that SSWM can optimise 
Balance in polynomial time. 

Theorem 14. With probability $1 - e^{-\Omega(n^{1/2})}$ SSWM with $\beta = n^{-3/2}$ and $N\beta = \ln n$ optimises Balance in time $O(n/\beta) = O(n^{5/2})$.

Proof. By Chernoff bounds, the probability that for the initial solution $x_0 = a_0 b_0$ we have $n/4 - n^{3/4} \le |b_0|_1 \le n/4 + n^{3/4}$ is $1 - e^{-\Omega(n^{1/2})}$. We assume pessimistically that $n/4 \le |b_0|_1 \le n/4 + n^{3/4}$. Then Lemma 13 is in force, and with probability $1 - e^{-\Omega(n^{1/2})}$ within $T$ relevant steps, $T$ as defined in Lemma 12, SSWM does not reach a trap or a search point with fitness 0. Lemma 12 then implies that with probability $1 - e^{-\Omega(n^{1/2})}$ an optimal solution with $n/2$ leading ones is found.

The time bound follows from the fact that $T = O(n/\beta)$ and that, again by Chernoff bounds, we have at least $T$ relevant steps in $3T$ iterations of SSWM, with probability $1 - e^{-\Omega(n^{1/2})}$. □
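To make the setting of Theorem 14 tangible, the following is a minimal simulation sketch of SSWM with local mutations on Balance. It rests on several assumptions that are not confirmed by the text above: a local mutation is taken to flip exactly one uniformly chosen bit, the $\sqrt{n}$ threshold in Balance is the reconstructed one, and all function names (`balance`, `p_fix`, `sswm_local`) are illustrative. The theorem is asymptotic, so a run on a small instance illustrates the mechanism only and need not find the optimum.

```python
import math
import random


def leading_ones(bits):
    count = 0
    for bit in bits:
        if bit != 1:
            break
        count += 1
    return count


def balance(x):
    """Static Balance for a 0/1 list x = ab of even length n (sqrt(n) threshold assumed)."""
    n = len(x)
    half = n // 2
    a, b = x[:half], x[half:]
    lo, ones_b = leading_ones(a), sum(b)
    if lo == half:
        return n ** 3
    if n / 16 < ones_b < 7 * n / 16:
        return ones_b + n * lo
    if half - sum(a) > math.sqrt(n):
        return n ** 2 * lo
    return 0


def p_fix(delta_f, beta, N):
    """Acceptance probability (1 - e^{-2*beta*df}) / (1 - e^{-2*N*beta*df})."""
    if delta_f == 0:
        return 1.0 / N
    a = 2.0 * beta * delta_f
    b = N * a
    if b < -700.0:
        # Huge loss: e^{-b} would overflow. The true value is below e^{b - a},
        # which is negligibly small, so that upper bound is returned instead.
        return math.exp(b - a)
    return math.expm1(-a) / math.expm1(-b)


def sswm_local(n, max_steps, rng):
    """Run SSWM with single-bit mutations; return (steps used, final fitness)."""
    beta = n ** -1.5
    N = math.log(n) / beta              # so that N * beta = ln n
    x = [rng.randint(0, 1) for _ in range(n)]
    fx = balance(x)
    for step in range(max_steps):
        if fx == n ** 3:                # global optimum reached
            return step, fx
        i = rng.randrange(n)
        x[i] ^= 1                       # propose flipping one bit
        fy = balance(x)
        if rng.random() < p_fix(fy - fx, beta, N):
            fx = fy                     # mutant takes over
        else:
            x[i] ^= 1                   # mutant rejected: undo the flip
    return max_steps, fx


steps, fitness = sswm_local(32, 50_000, random.Random(0))
print(steps, fitness)
```

The acceptance rule is the only place where SSWM differs from a (1+1)-type hill climber in this sketch: an extra leading one changes the fitness by roughly $n$ and is accepted with probability of order $n^{-1/2}$, while a single extra one in $b$ is accepted with probability of order $n^{-3/2}$.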

6. CONCLUSIONS 

The field of evolutionary computation has matured to the point where techniques can be applied to models of natural evolution. Our analyses have demonstrated that runtime analysis of evolutionary algorithms can be used to analyse a simple model of natural evolution, opening new opportunities for interdisciplinary research with population geneticists and biologists.

Our conclusions are highly relevant for biology, and open the door to the analysis of more complex fitness landscapes in this field and to quantifying the efficiency of evolutionary processes in more realistic scenarios of evolution. One interesting aspect of our results is that they impose conditions on population size ($N$) and strength of selection ($\beta$) which represent fundamental limits to what is possible by natural selection. We hope that these results may inspire further research on the similarities and differences between natural and artificial evolution.

From a computational perspective, we have shown that SSWM can overcome obstacles such as those posed by Cliff$_d$ and Balance in different ways to the (1+1) EA, due to its non-elitist selection mechanism. We have seen how the probability of accepting a mutant can be tuned to enable hill climbing, where fitness-proportional selection fails, as well as tunnelling through fitness valleys, where elitist selection fails. For Balance we showed that SSWM can take advantage of information about the steepest gradient. The selection rule in SSWM hence seems to be a versatile and useful mechanism. Future work could investigate its usefulness in the context of population-based evolutionary algorithms.

APPENDIX

This appendix contains proofs that were omitted from the main part. 


Lemma 15. $p_{\mathrm{fix}}$ is monotonic for all $N \ge 1$ and strictly increasing for $N > 1$.




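Proof. If $N = 1$, $p_{\mathrm{fix}}(\Delta f) = 1$. In order to show that $p_{\mathrm{fix}}(\Delta f)$ is monotonically increasing we show that

\[ \frac{d\, p_{\mathrm{fix}}}{d\, \Delta f} = \frac{2\beta e^{-2\beta \Delta f}\left(1 - e^{-2N\beta \Delta f}\right) - 2N\beta e^{-2N\beta \Delta f}\left(1 - e^{-2\beta \Delta f}\right)}{\left(1 - e^{-2N\beta \Delta f}\right)^{2}} > 0 \]

for all $\Delta f$. For $\beta \Delta f > 0$ and $N > 1$, we have $e^{-2N\beta \Delta f} < 1$, and $e^{-2\beta \Delta f} > e^{-2N\beta \Delta f}$. For $\beta \Delta f < 0$, the inequalities are reversed. If $\beta \Delta f > 0$: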


2 e" 2 ^ _ e -2iV/3A/ (1 _ e -2/3A/ ) 
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e -2A/3A/ ^ l _ g-2/3A/ ' 


— 4 p 2_A J -| — £, p AA J 

Since f'-'iNS a/ > 1 and *[[[ ^AJ 1 < 1 this proves the claim for fdAf > 0. For /JA/ < 0 all the inequalities are reversed and 


1 —e 

g —2j3A/ 1 _ e -2iV/3A/ 

-2JV/3A / < 1 an( l 1 _ e -2flA/ > 1- 
Proof of Lemma 3. We follow the proof of Lemma 2 in [20]. An offspring with $i + k$ 1-bits is created if and only if there is an integer $j \in \mathbb{N}_0$ such that $j$ 1-bits flip and $k + j$ 0-bits flip.
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It is easy to see that — \ f° r a h as the maximum is attained for i = j — Hence we get an upper bound of 
The proof for mutations decreasing the number of ones follows immediately due to the symmetry $\mathrm{mut}(i, i - k) = \mathrm{mut}(n - i, n - i + k)$. □


Proof of Lemma 8. The proof consists of two parts:

1) The probability of improving by $j - i = k$ bits is at least twice as large as the probability of improving by $k + 1$ bits, i.e. $\mathrm{mut}(i, i + k) \ge 2\,\mathrm{mut}(i, i + k + 1)$ for any $0 \le i < j < n$.

2) We use 1) to prove the claim of the lemma.

Part 1) The probability to improve by $k$ bits is

\[ \mathrm{mut}(i, i + k) = \sum_{l=0}^{n} \binom{i}{l} \binom{n-i}{k+l} \left(\frac{1}{n}\right)^{k+2l} \left(1 - \frac{1}{n}\right)^{n-k-2l}, \]

while the probability to improve by $k + 1$ bits is

\[ \mathrm{mut}(i, i + k + 1) = \sum_{l=0}^{n} \binom{i}{l} \binom{n-i}{k+l+1} \left(\frac{1}{n}\right)^{k+2l+1} \left(1 - \frac{1}{n}\right)^{n-k-2l-1}. \]

We want to show that $\mathrm{mut}(i, i + k) \ge 2\,\mathrm{mut}(i, i + k + 1)$. This holds if the following holds for any $0 \le l \le n$:

\[ \frac{1}{n - i - k - l} - \frac{2}{(n-1)(k+l+1)} \ge 0 \iff (n-1)(k+l+1) \ge 2(n - i - k - l), \]

which is true for any $k \ge 1$ (thus for any $0 \le i < j < n$).
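Part 1) can be checked exhaustively for small $n$ with exact rational arithmetic. The sketch below assumes standard bit mutation with rate $1/n$; the function name `mut` and its argument order `(n, i, j)` are chosen here for illustration, and the summation range is restricted to the indices with non-zero binomial coefficients.

```python
from fractions import Fraction
from math import comb


def mut(n, i, j):
    """Exact probability that bitwise mutation with rate 1/n turns a string
    with i ones into a string with j ones."""
    if j < i:
        return mut(n, n - i, n - j)       # symmetry between ones and zeros
    k = j - i
    p = Fraction(1, n)
    total = Fraction(0)
    # l ones flip to zero and k + l zeros flip to one.
    for l in range(min(i, n - i - k) + 1):
        flips = k + 2 * l
        total += comb(i, l) * comb(n - i, k + l) * p ** flips * (1 - p) ** (n - flips)
    return total


# Check mut(i, i+k) >= 2 * mut(i, i+k+1) for all valid i and k >= 1, n up to 8.
holds = all(
    mut(n, i, i + k) >= 2 * mut(n, i, i + k + 1)
    for n in range(2, 9)
    for i in range(n)
    for k in range(1, n - i)
)
print(holds)
```

Exact fractions are used rather than floats because the inequality is tight in some cases: for $i = 0$ and $k = 1$ both sides coincide, and rounding could flip a floating-point comparison.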

Part 2) Using the above inequality $\mathrm{mut}(i, i + k) \ge 2\,\mathrm{mut}(i, i + k + 1)$ we can bound every possible improvement better than $k$ from above by

\[ \mathrm{mut}(i, i + k + l) \le \left(\frac{1}{2}\right)^{l} \mathrm{mut}(i, i + k) \]

for any $0 \le l \le n - i - k$. This can also be written as

\[ \mathrm{mut}(i, j + l) \le \left(\frac{1}{2}\right)^{l} \mathrm{mut}(i, j) \]

for any $0 \le l \le n - j$. This leads to the claimed bound, which proves Lemma 8. □


Lemma 16. Consider independent random variables $Y_1, \dots, Y_t$ where

\[ Y_i = \begin{cases} 1 & \text{with probability } p \\ 0 & \text{with probability } 1 - p - r \\ -1 & \text{with probability } r, \end{cases} \]

then for $Y = \sum_{i=1}^{t} Y_i$ we have $E(Y) = t(p - r)$ and for every $0 \le \delta \le t(p + r)$

\[ P(Y \ge E(Y) + \delta) \le e^{-\Omega(t(p+r))} + e^{-\Omega(\delta^2/(t(p+r)))}. \]
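The quantities in Lemma 16 can be computed exactly for small parameters. The following sketch builds the distribution of $Y$ by repeated convolution and reports the upper tail $P(Y \ge E(Y) + \delta)$; the function names and the example values $t = 200$, $p = 0.03$, $r = 0.01$ are illustrative choices, and the sketch does not attempt to determine the constants hidden in the $\Omega$-notation.

```python
def sum_distribution(t, p, r):
    """Exact distribution of Y = Y_1 + ... + Y_t as a dict {value: probability}."""
    dist = {0: 1.0}
    for _ in range(t):
        nxt = {}
        for value, prob in dist.items():
            # Each Y_i is +1 w.p. p, -1 w.p. r and 0 otherwise.
            for step, q in ((1, p), (0, 1.0 - p - r), (-1, r)):
                nxt[value + step] = nxt.get(value + step, 0.0) + prob * q
        dist = nxt
    return dist


def upper_tail(dist, threshold):
    """P(Y >= threshold) under the given distribution."""
    return sum(prob for value, prob in dist.items() if value >= threshold)


t, p, r = 200, 0.03, 0.01
dist = sum_distribution(t, p, r)
mean = sum(value * prob for value, prob in dist.items())
print("E(Y) =", mean, "and t(p - r) =", t * (p - r))
for delta in (2, 4, 8):                 # all within 0 <= delta <= t(p + r) = 8
    print(delta, upper_tail(dist, mean + delta))
```

The tail shrinks as $\delta$ grows, in line with the second term of the bound, while the first term accounts for the rare event that unusually many of the $Y_i$ are non-zero.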

Proof. We imagine $Y_i$ to be drawn in a two-step process: in a first draw with probability $1 - p - r$ we set $Y_i = 0$. Otherwise, we have $Y_i \ne 0$ and a second random experiment determines whether $Y_i = 1$ or $Y_i = -1$. Define indicator variables $X_i \in \{0, 1\}$ for the first experiment: $X_i = 1$ if $Y_i \ne 0$. Then $X = \sum_{i=1}^{t} X_i$ gives the number of events where $Y_i \ne 0$. Furthermore, let $Z_j \in \{-1, +1\}$ be the outcome of the $j$-th instance of the second-type experiment (such an experiment only happens when the first draw determined $Y_i \ne 0$), and $Z = \sum_{j=1}^{X} Z_j$ be the sum of these variables. Since $Z$, in comparison to $Y$, excludes all summands of value 0, we have $Z = Y$ and hence $E(Z) = E(Y) = t(p - r)$.

It is easy to see that $(X < 2E(X)) \wedge (Z < E(Z) + \delta \mid X < 2E(X)) \Rightarrow (Y < E(Y) + \delta)$, therefore

\[ P(Y \ge E(Y) + \delta) \le P(X \ge 2E(X)) + P(Z \ge E(Z) + \delta \mid X < 2E(X)). \]

Now we apply a Chernoff bound to $X$ and a Hoeffding bound to $Z$, a sum of $X < 2E(X)$ variables:

\begin{align*}
P(Y \ge E(Y) + \delta) &\le e^{-\frac{1}{3} E(X)} + e^{-\frac{\delta^2}{4E(X)}} \\
 &= e^{-\Omega(E(X))} + e^{-\Omega(\delta^2/E(X))} \\
 &= e^{-\Omega(t(p+r))} + e^{-\Omega(\delta^2/(t(p+r)))}. \qquad \Box
\end{align*}



